Using Pilot Systems to Execute Many Task Workloads on Supercomputers

Authors

  • Andre Merzky
  • Matteo Turilli
  • Manuel Maldonado
  • Mark Santcroos
  • Shantenu Jha

Abstract

Traditionally, high-performance computing (HPC) systems have been optimized to support mostly monolithic workloads. The workload of many important scientific applications, however, is composed of spatially and temporally heterogeneous tasks that are often dynamically interrelated. These workloads can benefit from being executed at scale on HPC resources, but a tension exists between their resource utilization requirements and the capabilities of HPC system software and HPC usage policies. Pilot systems have been used successfully to address this tension. In this paper, we introduce RADICAL-Pilot (RP), a scalable and interoperable pilot system that faithfully implements the Pilot abstraction. We describe its design and characterize the performance of its components, as well as its performance on multiple heterogeneous HPC systems. Specifically, we characterize RP’s task execution component (the RP Agent), which is engineered for optimal resource utilization while maintaining the full generality of the Pilot abstraction.
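
The Pilot abstraction named in the abstract is commonly described as acquiring resources through a placeholder job and binding tasks to those resources only once they are available (late binding). The following is a minimal toy sketch of that idea in plain Python, not RADICAL-Pilot's API: `Task`, `ToyPilot`, and `run_workload` are hypothetical names introduced here for illustration, and execution is simulated in "waves" rather than run on a real machine.

```python
from collections import deque


class Task:
    """Hypothetical task description: a name and a core requirement."""

    def __init__(self, name, cores):
        self.name = name
        self.cores = cores


class ToyPilot:
    """Hypothetical placeholder job that holds a fixed pool of cores."""

    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.free_cores = total_cores

    def try_start(self, task):
        # Bind the task only if enough cores are free right now.
        if task.cores <= self.free_cores:
            self.free_cores -= task.cores
            return True
        return False

    def finish(self, task):
        # Return the task's cores to the pool.
        self.free_cores += task.cores


def run_workload(pilot, tasks):
    """Late-bind tasks to the pilot and return the simulated waves.

    Each wave lists the names of tasks that run concurrently.
    """
    pending = deque(tasks)
    waves = []
    while pending:
        wave = []
        # First-fit pass over the queue: start every task that fits.
        for task in list(pending):
            if pilot.try_start(task):
                pending.remove(task)
                wave.append(task)
        if not wave:
            raise ValueError("a pending task needs more cores than the pilot holds")
        waves.append([t.name for t in wave])
        # Simulate completion of the whole wave before scheduling again.
        for task in wave:
            pilot.finish(task)
    return waves


if __name__ == "__main__":
    pilot = ToyPilot(total_cores=4)
    workload = [Task("a", 2), Task("b", 3), Task("c", 2), Task("d", 1)]
    print(run_workload(pilot, workload))
```

In this sketch the heterogeneous tasks share one resource allocation, and the decision of which task runs where is made inside the placeholder rather than by the batch system. A real pilot system additionally handles queue submission, task launching, and asynchronous completion, none of which is modeled here.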

Similar resources

Design and Performance Characterization of RADICAL-Pilot on Titan

Many extreme-scale scientific applications have workloads comprised of a large number of individual high-performance tasks. The Pilot abstraction decouples workload specification, resource management, and task execution via job placeholders and late binding. As such, suitable implementations of the Pilot abstraction can support the collective execution of a large number of tasks on supercomputers...

Analysis of Distributed Execution of Workloads

Resource selection and task placement for distributed execution pose conceptual and implementation difficulties. Although resource selection and task placement are at the core of many tools and workflow systems, the models and methods are underdeveloped. Consequently, partial and noninteroperable implementations proliferate. We address both the conceptual and implementation difficulties by exp...

Emerging High Performance Computing Systems and Next Generation Engineering Analysis Applications

This paper provides a high-level overview of the intersection between the broad fields of Infrastructure Engineering and Computer Systems Engineering. The last two decades of technical high-performance computing (HPC) have been remarkably stable, with high-end scientific and engineering applications able to leverage the increases in performance of commodity processors in massively parallel supe...

Exploring Distributed Resource Allocation Techniques in the SLURM Job Management System

With the exponential growth of distributed computing systems in both flops and cores, scientific applications are growing more diverse with a variety of workloads. These workloads include traditional large-scale High Performance Computing MPI jobs, and ensemble workloads, such as Many-Task Computing workloads comprised of an extremely large number of tasks of finer granularity, where tasks are d...

Guest Editors' Introduction: Special Section on Many-Task Computing

It is our honor to serve as guest editors of this special section of the IEEE Transactions on Parallel and Distributed Systems (TPDS) on many-task computing (MTC). This section focuses on the methods required to manage and execute large multiple program multiple data (MPMD) computations on large clusters, grids, clouds, and supercomputers. We are pleased to present 10 high-quality contributions...

Publication date: 2015